On Seeking Consensus Between Document Similarity Measures
Similar resources
On Seeking Consensus Between Document Similarity Measures
This paper investigates the application of consensus clustering and meta-clustering to the set of all possible partitions of a data set. We show that when using a "complement" of the Rand index as a measure of cluster similarity, the total-separation partition, which puts each element in a separate set, is chosen.
Multilevel Measures of Document Similarity
Many applications such as document summarization, passage retrieval and question answering require a detailed analysis of semantic relations between terms within and across documents and sentences. Often one has a number of sentences or paragraphs and has to choose the candidate with the highest level of relevance for the topic or question. An additional requirement may be that the information ...
Investigating Measures for Pairwise Document Similarity
The need for a more effective similarity measure is growing as a result of the astonishing amount of information being placed online. Most existing similarity measures are defined by empirically derived formulas and cannot easily be extended to new applications. We present a pairwise document similarity measure based on Information Theory, and present corpus dependent and independent applicatio...
Similarity Measures for Text Document Clustering
Clustering is a useful technique that organizes a large quantity of unordered text documents into a small number of meaningful and coherent clusters, thereby providing a basis for intuitive and informative navigation and browsing mechanisms. Partitional clustering algorithms have been recognized as more suitable than hierarchical clustering schemes for processing large datasets....
Document Representation and Multilevel Measures of Document Similarity
We present our work on combining large-scale statistical approaches with local linguistic analysis and graph-based machine learning techniques to compute a combined measure of semantic similarity between terms and documents for application in information extraction, question answering, and summarisation.
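The first entry above frames consensus clustering as a search over all partitions of a data set, scored with a Rand-index-based similarity. The sketch below is only an illustration under stated assumptions. It uses the standard Rand index, defined as the fraction of element pairs on which two partitions agree (same block in both, or different blocks in both). It also treats consensus as the "median partition" that maximizes the summed Rand index to the input partitions, which is one common formulation and not necessarily the one used in the paper. The paper's "complement" variant is not reproduced. Partitions are encoded as label lists, so the total-separation partition of n elements is `[0, 1, ..., n-1]`. The helper names `rand_index`, `all_partitions`, and `consensus` are hypothetical. Brute-force enumeration is feasible only for tiny sets, because the number of partitions grows as the Bell numbers.

```python
from itertools import combinations


def rand_index(a, b):
    """Standard Rand index: fraction of element pairs on which partitions a and b agree."""
    if len(a) != len(b):
        raise ValueError("partitions must label the same elements")
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 1.0  # zero or one element: the partitions trivially agree
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def all_partitions(n):
    """Yield every partition of {0, ..., n-1} once, as a restricted-growth label list."""
    def grow(prefix, max_label):
        if len(prefix) == n:
            yield list(prefix)
            return
        # The next element joins an existing block or opens exactly one new block.
        for label in range(max_label + 2):
            prefix.append(label)
            yield from grow(prefix, max(max_label, label))
            prefix.pop()

    yield from grow([], -1)


def consensus(partitions):
    """Hypothetical median-partition consensus: the partition with the highest summed Rand index."""
    n = len(partitions[0])
    return max(all_partitions(n),
               key=lambda p: sum(rand_index(p, q) for q in partitions))
```

For example, `consensus([[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]])` returns `[0, 0, 1, 1]`, the partition that two of the three inputs share. Because the Rand index compares pair relations instead of label values, `[0, 0, 1, 1]` and `[1, 1, 0, 0]` score 1.0 against each other.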
Journal
Journal title: Fundamenta Informaticae
Year: 2017
ISSN: 0169-2968, 1875-8681
DOI: 10.3233/fi-2017-1597